24 - Pattern Recognition [PR] - PR 20 [ID:23066]

Welcome everybody to pattern recognition. So today we want to look into multilayer perceptrons, which are also called neural networks. We'll give a brief sketch of the ideas behind neural networks.

Okay, so let's have a look at multilayer perceptrons. You see that we talk here only about very basic concepts. If you're interested in neural networks, we have an entire class on deep learning where we talk about all the details, so here we will stay rather on the surface.

You may know that neural networks are extremely popular, also because they have this physiological motivation. We've seen that the perceptron essentially computes a weighted sum of its inputs, an inner product with some weights plus a bias, and you could say that this has some relation to neurons: neurons are connected to other neurons via axons, and they essentially collect the electrical activations coming from those other neurons. Once the collected input is greater than a certain threshold, the neuron is activated, and you typically have this zero-or-one response. It is either activated or not, and it doesn't matter how strong the actual activation is: if you are above the threshold, you have an output; if you're not, there is simply no output.
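As a minimal sketch of this all-or-none behavior, the following snippet implements a single perceptron with a step activation; the particular weights, bias, and input are purely illustrative.

```python
import numpy as np

def perceptron(x, w, b):
    """Weighted sum of the inputs plus a bias, followed by a hard threshold."""
    net = np.dot(w, x) + b          # inner product with the weights plus the bias term
    return 1.0 if net > 0 else 0.0  # all-or-none response: output only above the threshold

# Illustrative example: two inputs, arbitrary weights and bias
x = np.array([0.5, -1.0])
w = np.array([2.0, 1.0])
print(perceptron(x, w, b=0.3))  # -> 1.0, since 2.0*0.5 + 1.0*(-1.0) + 0.3 = 0.3 > 0
```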

Now we have these neurons, and we don't talk about the biological ones here but about the mathematical ones based on the perceptron. We can then go ahead and arrange them in layers, one on top of the other. We essentially have some input neurons that simply carry the input feature vector plus a bias, which we indicate here with a one, and this is then passed on in a fully connected approach, so we are essentially connecting everything with everything. We then have hidden layers, and they are hidden because we cannot really observe what is happening inside them. We can only observe that, if we have a given input sample and we know the weights, then we can actually compute what is happening there; if we don't have that, we generally don't see what is happening. We only see the output at the very end: the output is observable again, and we typically have a desired output, which can then be compared to the output of the network. This allows us to construct a training procedure.
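To make this concrete, here is a minimal sketch of such a fully connected network with one hidden layer; the layer sizes, the random weights, and the sigmoid activation (discussed just below) are all illustrative assumptions, and the comparison with a desired output is what a training procedure would later build on.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Illustrative two-layer fully connected network; all shapes and values are arbitrary.
rng = np.random.default_rng(0)
x  = rng.normal(size=3)          # input feature vector
W1 = rng.normal(size=(4, 3))     # weights from input layer to hidden layer
b1 = np.zeros(4)                 # hidden-layer bias (the "1" input times its weights)
W2 = rng.normal(size=(2, 4))     # weights from hidden layer to output layer
b2 = np.zeros(2)                 # output-layer bias

hidden = sigmoid(W1 @ x + b1)    # hidden activations: not directly observed in practice
output = sigmoid(W2 @ hidden + b2)

desired = np.array([1.0, 0.0])   # desired (target) output for this sample
error = np.sum((desired - output) ** 2)  # comparing desired and actual output drives training
print(output, error)
```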

Note that we are not only computing weighted sums of the input elements; what is also very important is the non-linearity. We kind of need to model this all-or-none response function, and we've seen that Rosenblatt originally used the step function. Of course we could also use linear functions, but if we did, then, as we will see towards the end of this video, everything would essentially collapse down to a single big matrix multiplication. In every fully connected layer you are essentially computing a matrix multiplication of the activations of the previous layer with the weights leading to the next one, so this part can simply be modeled as a matrix. What is typically not modeled as a matrix is the activation function, which is applied element-wise. The step function was Rosenblatt's original choice, but in later, classical approaches the following two functions were commonly used: the sigmoid function, i.e. the logistic function, was very popular, and as an alternative the hyperbolic tangent was also used because it has some advantages with respect to the optimization.
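The following sketch shows this split between the linear part and the element-wise non-linearity: a fully connected layer as a matrix multiplication, followed by a sigmoid or hyperbolic tangent. The weights and inputs are arbitrary placeholders.

```python
import numpy as np

def sigmoid(net):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-net))

def tanh(net):
    """Hyperbolic tangent, applied element-wise; zero-centered, which can help optimization."""
    return np.tanh(net)

y_prev = np.array([0.2, -0.7, 1.5])   # activations of the previous layer (illustrative)
W = np.ones((2, 3)) * 0.1             # fully connected weights (illustrative)
b = np.array([0.05, -0.05])           # bias of the current layer

net = W @ y_prev + b                  # the linear part: a single matrix multiplication
print(sigmoid(net), tanh(net))        # the non-linear part: element-wise activation
```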

So we can now write down the units of these networks essentially as sums over the previous layer. We have some y_i, which is the output of the previous layer, so we indicate it here with l minus one. This is multiplied with some weight w_ij, and we also have the bias w_0j in the current layer l. This sum essentially constructs the net output already, and the net output is then also run through the activation function f, where f is one of the choices that we saw above. This introduces the non-linearity that then produces the output of layer l in the respective neuron.
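Written out in the notation just described, the net input and the output of neuron j in layer l are:

\[
\mathrm{net}_j^{(l)} = \sum_i w_{ij}\, y_i^{(l-1)} + w_{0j}, \qquad y_j^{(l)} = f\!\left(\mathrm{net}_j^{(l)}\right)
\]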

Now you want to be able to train this, and this is typically done using the backpropagation algorithm. This is a supervised learning procedure, and backpropagation helps you to compute the gradients. So backpropagation is actually not the learning algorithm itself, but it is the way of computing the gradient.
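As a rough sketch of what that means (not the lecture's full derivation), the snippet below computes the gradient of a squared error for a single sigmoid layer via the chain rule; the actual learning step, here a plain gradient-descent update, is then a separate rule that merely uses this gradient. All values and the choice of loss are illustrative assumptions.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Illustrative single sigmoid layer; all values are placeholders.
x = np.array([0.2, -0.4, 0.9])        # input (or previous-layer activations)
W = np.zeros((2, 3))                  # weights, initialized to zero for simplicity
b = np.zeros(2)
t = np.array([1.0, 0.0])              # desired (target) output

# Forward pass
net = W @ x + b
y = sigmoid(net)
loss = 0.5 * np.sum((y - t) ** 2)

# Backward pass: the chain rule gives the gradient of the loss w.r.t. the weights
delta = (y - t) * y * (1.0 - y)       # dLoss/dnet, using the sigmoid derivative y*(1-y)
grad_W = np.outer(delta, x)           # dLoss/dW
grad_b = delta                        # dLoss/db

# A separate learning rule (e.g. gradient descent) then updates the weights:
lr = 0.1
W -= lr * grad_W
b -= lr * grad_b
```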

Part of a video series:
Accessible via: Open access
Duration: 00:19:20 min
Recording date: 2020-11-08
Uploaded on: 2020-11-08 13:47:16
Language: en-US

In this video, we have a short introduction to the multi-layer perceptron.

This video is released under CC BY 4.0. Please feel free to share and reuse.

For reminders about new videos, follow us on Twitter or LinkedIn. Also, join our network for information about talks, videos, and job offers in our Facebook and LinkedIn groups.

Music Reference: Damiano Baldoni - Thinking of You
